Abstract

Suppose that we wish to estimate a vector $x \in \mathbb{C}^n$ from a small number of noisy linear
measurements of the form $y = Ax + z$, where $z$ represents measurement noise. When the vector
$x$ is sparse, meaning that it has only $s$ nonzeros with $s \ll n$, one can obtain a significantly more
accurate estimate of x by adaptively selecting the rows of A based on the previous measurements 
provided that the signal-to-noise ratio (SNR) is sufficiently large. In this paper we consider 
the case where we wish to realize the potential of adaptivity but where the rows of A are 
subject to physical constraints. In particular, we examine the case where the rows of A are 
constrained to belong to a finite set of allowable measurement vectors. We demonstrate both 
the limitations and advantages of adaptive sensing in this constrained setting. We prove that 
for certain measurement ensembles, the benefits offered by adaptive designs fall far short of the 
improvements that are possible in the unconstrained adaptive setting. On the other hand, we 
also provide both theoretical and empirical evidence that in some scenarios adaptivity does still 
result in substantial improvements even in the constrained setting. To illustrate these potential 
gains, we propose practical algorithms for constrained adaptive sensing by exploiting connections 
to the theory of optimal experimental design and show that these algorithms exhibit promising 
performance in some representative applications. 

1 Introduction

Suppose that we wish to estimate a sparse vector from a small number of noisy linear measurements. 
In the setting where the measurements are selected in advance (independently of the signal) we now 
have a rich understanding of both practical algorithms and the theoretical limits on the performance 
of these algorithms. A typical result from this literature states that for a suitable measurement 
design, one can estimate a sparse vector with an accuracy that matches the minimax lower bound up 
to a constant factor [7]. Such results have had a tremendous impact in a variety of practical settings. 
In particular, they provide the mathematical foundation for “compressive sensing,” a paradigm for 
efficient sampling that has inspired a range of new sensor designs over the last decade. 

A distinguishing feature of the standard compressive sensing paradigm is that the measurements
are nonadaptive, meaning that a fixed set of measurements are designed and acquired without 
allowing for any possibility of adapting as the measurements begin to reveal the structure of the 
signal. While this can be attractive in the sense that it enables simpler hardware design, in the 
context of sparse estimation this also leads to some clear drawbacks. In particular, this would mean 
that even once the acquired measurements show us that portions of the signal are very likely to be 
zero, we may still expend significant effort in “measuring” these zeros! In such a case, by adaptively 
choosing the measurements, dramatic improvements may be possible. 

Inspired by this potential, recent investigations have shown that we can often acquire a sparse
(or compressible) signal via far fewer measurements or far more accurately if we choose them adaptively
(e.g., see [ElEHESlEe]). This body of work, which will be discussed in greater detail in
Section 1.2, demonstrates that adaptive sensing indeed offers the potential for dramatic improvements
over nonadaptive sensing in many settings. However, the existing approaches to adaptive
sensing, which rely on being able to acquire arbitrary linear measurements, cannot be applied in
most real-world applications where the measurements must respect certain physical constraints. In
this paper, our focus is on constrained adaptive sensing, where our measurements are restricted to
be chosen from a particular set of allowable measurements. We will see that new algorithms are required,
and we explore the theoretical limits within this more restrictive setting. Before describing the
constrained adaptive setting in more detail, we first provide a brief review of existing approaches 
to nonadaptive and adaptive sensing of sparse signals. 


1.1 Nonadaptive sensing 

In the standard nonadaptive compressive sensing framework [17], we acquire a signal $x$
via the linear measurements $y = Ax + z$, where $A$ is an $m \times n$ matrix representing the sensing
system and $z$ represents measurement noise. The goal is to design $A$ so that $m$ is smaller than $n$
by exploiting the fact that $x$ is sparse (or nearly sparse). Given a basis $\Psi$, we say that a signal
$x \in \mathbb{C}^n$ is $s$-sparse if it can be represented by a linear combination of just $s$ elements from $\Psi$, i.e.,
we can write $x = \Psi\alpha$ with $\|\alpha\|_0 \le s$, where $\|\alpha\|_0 := |\mathrm{supp}(\alpha)|$ denotes the number of nonzeros in
$\alpha$. We will typically be interested in the case where $s \ll n$.

There is now a rich literature that describes a wide range of techniques for designing an appropriate
$A$ and efficient algorithms for recovering $x$. In much of this literature, the matrix $A$ is
chosen via randomized constructions that are known to satisfy certain desirable properties such
as the so-called restricted isometry property (RIP).^1 Under the assumption that $A$ satisfies the
RIP (or that $A\Psi$ satisfies the RIP in the case where $\Psi \ne I$), if each entry of $z$ is independent
white Gaussian noise with variance $\sigma^2$, then one can show that techniques based on $\ell_1$-minimization
produce an approximation $\hat{x}$ satisfying

\[ \mathbb{E}\|\hat{x} - x\|_2^2 \le C \frac{n \log n}{\|A\|_F^2} s\sigma^2, \tag{1} \]

where $C > 1$ is a fixed constant (e.g., see m pp. 35]). Note that this bound holds for any $x$,
and hence any SNR (even the worst case). It is possible to obtain improved bounds that eliminate
the $\log n$ factor when one assumes that the SNR is sufficiently large to ensure that the support is
exactly recovered.

^1 See Section 2 for a more detailed discussion of the RIP and its implications in the context of adaptive sensing.
Note that the RIP is typically stated to require $\|Ax\|_2 \approx \|x\|_2$ for all $s$-sparse $x$, which implies a fixed scaling for
the matrix $A$ where $\|A\|_F^2 \approx n$. To ease the comparison with results that arise in contexts with alternative scalings,
in the result stated in (1) we make no assumption on the scaling of $A$ and merely require $\|Ax\|_2 \approx \beta\|x\|_2$ for some
$\beta > 0$.

One can show that this result is essentially optimal in the sense that there is no alternative 
method to choose A or perform the reconstruction that can do better than this (up to the precise 
value of the constant $C$) [7]. In the event that the signal $x$ is not exactly $s$-sparse, it is also possible
to extend these results by introducing an additional term in the error bound that measures the 
error incurred by approximating x as s-sparse. See m and references therein for further details. 

1.2 Adaptive sensing 

A defining feature of the approach described above is that it is completely nonadaptive. When 
we consider the effect of noise, this nonadaptive approach might draw some severe skepticism. To 
see why, note that in the nonadaptive scenario, most of the “sensing energy” is used to measure 
the signal at locations where there is no information, i.e., where the signal vanishes. Specifically, 
one consequence of using the randomized constructions for A typically considered in the literature, 
or alternatively, any matrix satisfying the RIP, is that the available sensing energy (i.e., $\|A\|_F^2$) is
evenly distributed across all possible indices. This is natural since a priori we do not know where
the nonzeros may lie; however, since most of the coordinates $x_j$ are zero, it also means that the
vast majority of the sensing energy is seemingly wasted. In other words, by design, the sensing 
vectors are approximately orthogonal to the signal, yielding a poor signal-to-noise ratio (SNR). 

The idea behind adaptive sensing is that we should focus our sensing energy on locations where 
the signal is nonzero in order to increase the SNR, or equivalently, not waste sensing energy. In 
other words, one should try to learn as much as possible about the signal while acquiring it in 
order to design more effective subsequent measurements. Roughly speaking, one would like to 
(i) detect those entries which are nonzero or significant, (ii) progressively concentrate the sensing 
vectors on those entries, and (iii) estimate the signal from such localized linear functionals. Such a
strategy is employed by the compressive binary search and compressive adaptive sense and search
strategies of [15] and [26]. These algorithms operate by examining successively smaller pieces of the
signal to accurately determine the locations of signal energy. These techniques can yield dramatic 
improvements in recovery accuracy. 

To quantify the potential benefits of an adaptive scheme, suppose that we observe 


\[ y_i = \langle a_i, x \rangle + z_i, \tag{2} \]

where the $z_i$ are independent and identically distributed (i.i.d.) $\mathcal{N}(0, \sigma^2)$ entries and the $a_i$ are
allowed to depend on the measurement history $((y_1, a_1), \ldots, (y_{i-1}, a_{i-1}))$, with the only constraint
being that $\sum_i \|a_i\|_2^2 = \|A\|_F^2$ is fixed. Consider a simple procedure that uses half of the sensing
energy in a nonadaptive way to identify the support of an $s$-sparse vector $x$ and then adapts to use
the remaining half of the sensing energy to estimate the values of the nonzeros. If such a scheme
identifies the correct support, then it is easy to show that this procedure can yield an estimate
satisfying

\[ \mathbb{E}\|\hat{x} - x\|_2^2 = \frac{2s}{\|A\|_F^2} s\sigma^2. \tag{3} \]

If we contrast this result to that in (1), which represents the best possible performance in the
nonadaptive setting, we see that this simple adaptive scheme can potentially improve upon the
nonadaptive scheme by a factor of roughly $(n/s) \log n$, which represents a dramatic improvement
in the typical scenario where $s \ll n$. Of course, this is predicated on the assumption that the first
stage of support identification succeeds, which is not always the case. 
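
The comparison between (1) and (3) can be checked numerically. The following is a minimal sketch, not code from the paper: the function names, the choice $C = 1$, and the parameter values are illustrative assumptions. It evaluates both expressions and simulates only the second (estimation) stage of the two-stage scheme, assuming the support has already been identified correctly and that the remaining energy $\|A\|_F^2/2$ is split evenly over the $s$ support coordinates.

```python
import math
import random

def nonadaptive_bound(n, s, sigma2, energy, C=1.0):
    """Right-hand side of (1): C * (n log n / ||A||_F^2) * s * sigma^2 (C = 1 is an assumption)."""
    return C * n * math.log(n) / energy * s * sigma2

def two_stage_adaptive_mse(s, sigma2, energy):
    """Right-hand side of (3): (2s / ||A||_F^2) * s * sigma^2."""
    return 2.0 * s / energy * s * sigma2

def simulate_second_stage(x_support, sigma2, energy, trials=2000, seed=0):
    """Monte Carlo MSE of the estimation stage, assuming the support is known exactly."""
    rng = random.Random(seed)
    s = len(x_support)
    gain = math.sqrt(energy / (2.0 * s))   # each measurement vector is gain * e_j
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _ in range(trials):
        for xj in x_support:
            y = gain * xj + rng.gauss(0.0, sigma)   # one noisy measurement of coordinate j
            total += (y / gain - xj) ** 2
    return total / trials

# Hypothetical parameter values chosen only for illustration.
n, s, sigma2, energy = 1024, 8, 1.0, 1024.0
predicted = two_stage_adaptive_mse(s, sigma2, energy)
empirical = simulate_second_stage([3.0] * s, sigma2, energy)
ratio = nonadaptive_bound(n, s, sigma2, energy) / predicted   # equals n log(n) / (2 s)
```

With these values the ratio of the two expressions is $n \log n / (2s)$, in line with the factor of roughly $(n/s)\log n$ quoted above.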

A fundamental question is thus: in practice, how much lower can the mean squared error (MSE)
be when we are allowed to sense the signal adaptively? The answer is a subtle one. In [1], it is
shown that there is a fixed constant $C > 0$ such that

\[ \inf_{\hat{x}} \sup_{\|x\|_0 \le s} \mathbb{E}\|\hat{x} - x\|_2^2 \ge C \frac{n}{\|A\|_F^2} s\sigma^2. \tag{4} \]

In other words, for even the best possible adaptive scheme there are $s$-sparse vectors for which
our recovery error is bounded below by the right-hand side of (4). This lower bound improves upon the nonadaptive
performance (1) by only a factor of $\log n$, coming far short of the improvement that (3) indicates
might be possible. Similar results are also obtained in m. These results are established by
considering vectors that are so difficult to estimate that it is impossible to obtain a reliable estimate
of their support, and so adaptive algorithms offer limited room for improvement over nonadaptive
ones.

The result (4) does not say that adaptive sensing never helps. In fact, in practice it almost
always does help. For example, when some or most of the nonzero entries in $x$ are only slightly
larger than the worst-case amplitude identified in [1], we can detect them sufficiently reliably to
enable the dramatic improvements predicted in (3). More concretely, provided that $\sigma^2$ is not too
large^2 relative to the nonzero entries of $x$, a well-designed adaptive scheme, where the $a_i$ are chosen
sequentially as in [15, 26], can achieve

\[ \mathbb{E}\|\hat{x} - x\|_2^2 \le C \frac{s}{\|A\|_F^2} s\sigma^2 \tag{5} \]

for a fixed constant $C$, which represents an enormous improvement when $s \ll n$, and demonstrates
that the potential benefits suggested in (3) can be realized in certain regimes.

We briefly note that these results are somewhat reminiscent of classical results from the field 
of information-based complexity pl ll7ll30ll37j as well as more recent results in active learning [12].
Although this literature considers different observation models (e.g., noise-free observations of
nonsparse signals), the general theme is that adaptivity is beneficial only in certain regimes (e.g.,
see [22]). In another direction, we also note that several authors have previously suggested Bayesian
approaches to adaptive sensing that are highly relevant to the problems we study in this paper,
but which currently lack much in the way of theoretical justification or understanding [11, 23, 29].

1.3 Constrained sensing 

Up to this point, we have discussed results in which we essentially have complete freedom to design 
both the adaptive and nonadaptive measurements in an optimal fashion (that is, up to a constraint 
on $\|A\|_F^2$). However, there are many applications where such freedom does not exist, and there are
significant constraints on the kind of measurements that we can actually acquire. Such constraints
arise in various hardware devices inspired by compressive sensing. For example, the single-pixel
camera [18] acquires samples of an image by computing inner products with binary patterns. In
this application we could still utilize adaptive measurements, but they must be binary. In other
applications, we may be restricted to obtaining point samples of the signal of interest. For example,
in standard sampling systems we are restricted to individually measuring each signal coefficient over
time or space. Finally, in tomography and magnetic resonance imaging (MRI), as well as other
medical imaging settings, we cannot acquire inner products with arbitrary linear functionals; we
are limited to Fourier measurements.

^2 For example, the compressive binary search procedure proposed in [15] succeeds in finding the location of the
smallest nonzero entry of amplitude $\mu$ with probability $1 - \delta$ when $\mu^2/\sigma^2 > 16 n \log([\ldots] + 1)/\|A\|_F^2$. The result for the
procedure in [26] is similar.

Figure 1: (Left) The median squared error versus the signal dimension $n$ for nonadaptive recovery with
uniformly random selected measurements (red) and oracle adaptive recovery (black). (Right) The ratio
(green) of the nonadaptive median squared recovery error to the oracle adaptive median squared recovery
error versus the signal dimension $n$, with $\log n$ (solid magenta) and $n/s$ (dashed magenta) included for
reference.

In all of these settings, the measurements are constrained: we still have the flexibility to design
measurements adaptively, but we can only select measurements from a fixed ensemble of predetermined
measurements. Thus, the constrained setting will typically preclude the use of any of
the adaptive sensing algorithms referenced above, and a new approach is required. Specifically, if
we let $\mathcal{M} \subset \mathbb{C}^n$ denote the set of candidate measurement vectors, then the constrained adaptive
sensing problem becomes one of sequentially selecting the rows $a_i$ of our sensing matrix from the
set $\mathcal{M}$. In this work, we assume the multiplicity of a particular measurement from $\mathcal{M}$ is allowed to
be greater than one; that is, repeated measurements are permitted. For the methods discussed in
this paper, we will restrict our attention to the case where $\mathcal{M}$ is a finite set. For a majority of our
discussion and examples, we will focus on the setting where $\mathcal{M} = \{f_1, f_2, \ldots, f_n\}$ consists of rows
from the Discrete Fourier Transform (DFT) matrix. We stress, however, that we need not require
$|\mathcal{M}| = n$ in general.

With the restriction that the measurements be chosen from the DFT ensemble, Figure 1 illustrates
the large potential difference between a completely nonadaptive sensing scheme, where the
measurements are selected uniformly at random, and an “oracle” adaptive sensing scheme which
uses a priori knowledge of the true locations of the nonzeros in a signal to carefully adapt the
choice of measurement vectors to minimize the expected recovery error using the strategy outlined
in Section 3. In both cases, the Compressive Sampling Matching Pursuit (CoSaMP) [27] algorithm
is used for the signal recovery. The median^3 squared error over 200 trials is displayed against the
signal dimension $n$. Here, the signal is chosen to have a sparse Haar wavelet decomposition that
is supported on a tree. The choice of a tree-sparse signal is motivated by the observation that
natural images typically have a structured sparsity pattern in a wavelet domain due to correlations
between scales. In these simulations, the noise level = 10“^ is held constant while the nonzero 
coefficients scale as $\sqrt{n}$ so that the per-measurement SNR is fixed. The number of measurements
taken is set to be $m = 0.6n$ (rounding when necessary). See Section 4.1 for further details regarding
these simulations.

^3 Note that the median and mean curves exhibit the same overall behavior; however, we display the median error
across all trials rather than the mean error throughout because the median, being a more robust measure, resulted
in smoother curves with clearer trends between the methods.

It is well known that the DFT and Haar wavelet transforms are not incoherent, which implies
that $A\Psi$ should not satisfy the RIP; hence, we would not expect blind nonadaptive sensing to do
well in this setting. However, Figure 1 does illustrate the large potential for improvement over
nonadaptive sensing. In this case, the adaptive algorithm can potentially improve the recovery
error over nonadaptive sensing by roughly a factor of $n/s$, which represents a substantial gain when
$s \ll n$. While we will see below that there are also nonadaptive strategies to address the coherence
of Fourier and Haar which somewhat reduce the gap between adaptive and nonadaptive sensing 
in this case, we believe that this clearly illustrates the potential for adaptive sensing, even in the 
constrained setting. 

1.4 Organization 

The remainder of the paper is organized as follows. In Section 2, we show a simple lower bound
on the adaptive performance of systems limited to DFT measurements. We then generalize this
result to the larger class of measurements satisfying the RIP. In both cases, the signal is assumed
to be sparse in the canonical basis. In Section 3, we give a method for measurement selection based
on optimal experimental design. In Section 4, we provide simulations in a more realistic setting
and display numerical results when Fourier measurements are used and the signal is assumed to
be sparse in the Haar wavelet basis, for both synthetic and realistic signals. We also present
some analytical justification using 1-sparse signals in this constrained adaptive setting. Finally, we
conclude in Section 5 with a brief discussion.


2 Lower bounds on adaptive performance 


The main result of this section shows that adaptive sensing cannot offer substantial improvements 
over the nonadaptive scheme when the measurements are restricted to certain specific classes of 
ensembles and the signal is sparse in the canonical basis (i.e., $\Psi = I$). We first consider the Fourier
ensemble, where the sensing vectors are chosen from the rows of the DFT matrix $F \in \mathbb{C}^{n \times n}$, where
$F$ has entries

\[ f_{jk} = \frac{1}{\sqrt{n}} \exp(-2\pi\sqrt{-1}\, jk/n) \tag{6} \]

for $j, k = 0, 1, \ldots, n - 1$. In this constrained setting we have the following lower bound.
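
As a small illustration of the ensemble in (6), the sketch below builds the unitary DFT matrix with the Python standard library and computes the squared Frobenius norm of a set of selected rows restricted to a support. The helper names and the particular rows and support are illustrative assumptions. Because every entry of $F$ has magnitude $1/\sqrt{n}$, any $m$ rows (repeats included) restricted to $s$ columns have squared Frobenius norm $ms/n$, regardless of which rows are chosen.

```python
import cmath
import math

def dft_matrix(n):
    """n x n unitary DFT matrix with entries exp(-2*pi*sqrt(-1)*j*k/n) / sqrt(n), as in (6)."""
    scale = 1.0 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]

def inner(u, v):
    """Inner product sum_k u_k * conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def restricted_frobenius_sq(F, rows, support):
    """Squared Frobenius norm of the selected rows restricted to the support columns."""
    return sum(abs(F[j][k]) ** 2 for j in rows for k in support)

n = 16
F = dft_matrix(n)
rows = [0, 3, 3, 7, 12]      # hypothetical selection; repeated rows are allowed
support = [1, 5, 9]          # hypothetical support of size s = 3
fro_sq = restricted_frobenius_sq(F, rows, support)   # equals m * s / n = 15 / 16
```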


Theorem 1. Under the adaptive measurement model of (2), where the $a_i$ are chosen (potentially
adaptively and allowing repeated measurements) by selecting rows from the DFT matrix (6), we
have that

\[ \inf_{\hat{x}} \sup_{\|x\|_0 \le s,\ \|x\|_2 \ge R} \mathbb{E}\|\hat{x} - x\|_2^2 \ge \frac{n}{m} s\sigma^2 \tag{7} \]

for any $R > 0$.


This shows that even using an optimal choice of sensing vectors, the recovery error is still
proportional to $\frac{n}{m} s\sigma^2$ even if we exclude the low-SNR setting (by setting $R$ to be large relative to
$\sigma$). This is somewhat reminiscent of the main results of [1] and m, which (in an unconstrained
setting) establish minimax bounds of the form given in (4). However, a key difference is that in
the unconstrained setting the worst-case error which defines the minimax rate is determined by
the performance at a certain range of worst-case SNRs. Specifically, these bounds are obtained by
constructing a “least favorable prior” where the nonzeros of $x$ are near a specific level,^4 and thus if
we were to exclude these challenging $x$ via the restriction that $\|x\|_2 \ge R$ as in (7), the bound in (4)
would be dramatically lower; in particular, the gains shown in (5) could be realized [15, 26]. Thus,
in a sense Theorem 1 is far more pessimistic than these results since it applies no matter how large
the SNR, although given the incoherence of the DFT and the canonical bases, perhaps this is not
that surprising. Finally, we note that for certain values of $R$ it may be possible to obtain a slightly
stronger version of Theorem 1 (by a $\log n/\log\log n$ factor) using the techniques in [20, Thm. 6.1].
We do not pursue these refinements here.

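Proof of Theorem 1. For any adaptive procedure $\hat{x}$, we let $F'$ be the $m \times n$ sensing matrix consisting
of the $m$ adaptively chosen vectors from the rows of $F$, and let $F'_\Lambda$ denote the $m \times s$ submatrix of
$F'$ whose column indices correspond to the indices of the support $\Lambda$ of $x$. Using the rows of $F'$ to
acquire the measurements as in (2), we obtain $y = F'x + z = F'_\Lambda x_\Lambda + z$. It is not difficult to show
(e.g., see the Appendix of 0) that

\[ \inf_{\hat{x}} \sup_{\|x\|_0 \le s} \mathbb{E}\|\hat{x} - x\|_2^2 \ge \inf_{\hat{x}} \sup_{x' \in \mathbb{R}^s} \mathbb{E}\|\hat{x}(F'_\Lambda x' + z) - x'\|_2^2, \]

where $\hat{x}(\cdot)$ takes values in $\mathbb{R}^s$.

To establish the bound in (7) we consider a sequence of least favorable prior distributions on
$x'$. The minimax risk is always larger than the Bayes risk under any prior, so this will establish a
lower bound on the minimax risk. Towards this end, consider the prior on $x'$ where $x' \sim \mathcal{N}(0, \rho^2 I)$,
but where the distribution is truncated to be zero for $\|x'\|_2 < R$ and re-scaled appropriately. Note
that in the absence of this truncation, the Bayes risk would be given by

\[ \sigma^2 \sum_{j=1}^{s} \frac{1}{\sigma_j^2(F'_\Lambda) + \sigma^2/\rho^2}, \tag{8} \]

where $\sigma_j(F'_\Lambda)$ denotes the $j$-th singular value of $F'_\Lambda$. This follows from the fact that the Bayes
estimator is given by

\[ \mathbb{E}[x' \mid y] = \Big(F'^*_\Lambda F'_\Lambda + \frac{\sigma^2}{\rho^2} I\Big)^{-1} F'^*_\Lambda y. \]

The result in (8) follows from the fact that for this estimator the expected squared error is given
by

\[ \sigma^2 \Big\| \Big(F'^*_\Lambda F'_\Lambda + \frac{\sigma^2}{\rho^2} I\Big)^{-1} F'^*_\Lambda \Big\|_F^2 + \frac{\sigma^4}{\rho^2} \Big\| \Big(F'^*_\Lambda F'_\Lambda + \frac{\sigma^2}{\rho^2} I\Big)^{-1} \Big\|_F^2, \]

where the first term is the contribution of the noise and the second is the contribution of the prior, and
which reduces to (8) via the application of standard properties of the singular value decomposition.
We now note that for any $R > 0$, as $\rho \to \infty$, the Bayes risk for the truncated prior will converge
to that of (8), namely,

\[ \sigma^2 \sum_{j=1}^{s} \frac{1}{\sigma_j^2(F'_\Lambda)}. \]

Putting this all together, we have that

\[ \inf_{\hat{x}} \sup_{\|x\|_0 \le s,\ \|x\|_2 \ge R} \mathbb{E}\|\hat{x} - x\|_2^2 \ge \sigma^2 \sum_{j=1}^{s} \frac{1}{\sigma_j^2(F'_\Lambda)} \ge \sigma^2 \frac{s^2}{\sum_{j=1}^{s} \sigma_j^2(F'_\Lambda)} = \sigma^2 \frac{s^2}{\|F'_\Lambda\|_F^2}, \]

where the second inequality follows from Jensen’s inequality. Since $\|F'_\Lambda\|_F^2 = ms/n$, this completes
the proof. □

^4 This threshold is around $\min_i x_i^2/\sigma^2 \approx (n/m) \log s$.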
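
The last step of the proof can be checked numerically in the simplest nontrivial case, $s = 2$. The sketch below is an illustration under stated assumptions rather than part of the original argument: the function names and the chosen rows and supports are hypothetical. It evaluates $\sigma^2 \sum_j 1/\sigma_j^2(F'_\Lambda) = \sigma^2\,\mathrm{tr}((F'^*_\Lambda F'_\Lambda)^{-1})$ in closed form for a $2 \times 2$ Gram matrix and compares it with the bound $\frac{n}{m} s\sigma^2$ of (7).

```python
import cmath
import math

def dft_entry(j, k, n):
    """Entry f_jk of the unitary DFT matrix in (6)."""
    return cmath.exp(-2j * cmath.pi * j * k / n) / math.sqrt(n)

def oracle_mse_two_sparse(rows, support, n, sigma2):
    """sigma^2 * tr((B^* B)^{-1}), B = selected DFT rows restricted to a 2-element support."""
    a, b = support
    diag = len(rows) / n   # each column of B has squared norm m / n
    off = sum(dft_entry(j, a, n).conjugate() * dft_entry(j, b, n) for j in rows)
    det = diag * diag - abs(off) ** 2
    if det <= 1e-15:
        return math.inf    # B is rank deficient, so the two coefficients cannot be separated
    return sigma2 * 2.0 * diag / det

def theorem1_bound(n, m, s, sigma2):
    """Right-hand side of (7)."""
    return n / m * s * sigma2

n = 16
# Rows 0 and 1 give orthogonal restricted columns for the support {0, 8}: the bound is attained.
mse_tight = oracle_mse_two_sparse([0, 1], (0, 8), n, 1.0)
# A generic selection with a repeated row: the error is at least the bound.
mse_generic = oracle_mse_two_sparse([0, 1, 4, 4, 9], (2, 5), n, 1.0)
```

The first selection meets the bound with equality because the two restricted columns are orthogonal, which is exactly the case where Jensen's inequality is tight.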


Our next result generalizes this type of lower bound to any ensemble whose submatrices satisfy 
the RIP with overwhelming probability. This statement is significant because it suggests that in 
some constrained situations, specifically many commonly studied in compressive sensing, there is 
little benefit from adaptivity. Formally, we define an RIP ensemble as follows. 


Definition 1. Let $m$ be fixed. We say that an $n \times n$ matrix $A$ with unit-norm rows is an RIP
ensemble if for any $m' \ge m$ a random $m' \times n$ submatrix $\tilde{A}$, whose rows are uniformly chosen
without replacement, satisfies

\[ 0.5 \frac{m'}{n} \|u\|_2^2 \le \|\tilde{A}u\|_2^2 \le 1.5 \frac{m'}{n} \|u\|_2^2 \tag{9} \]

for all $s$-sparse $u$ with probability $1 - \exp(-cn)$ (where $c$ is such that $\exp(-cn) < 1/(2n)$).

Theorem 2 makes rigorous the claim that selecting rows intelligently from such a matrix yields
no substantial improvement over a nonadaptive scheme.


Theorem 2. Under the adaptive measurement model of (2), where the $a_i$ are chosen (potentially
adaptively and allowing repeated measurements) by selecting rows from an RIP ensemble as defined
above, we have that

\[ \inf_{\hat{x}} \sup_{\|x\|_0 \le s,\ \|x\|_2 \ge R} \mathbb{E}\|\hat{x} - x\|_2^2 \ge \frac{sn}{3m^2} s\sigma^2 \tag{10} \]

for any $R > 0$.


We note that one usually anticipates $m$ to be on the order of $s \log n$, in which case this bound
becomes

\[ \mathbb{E}\|\hat{x} - x\|_2^2 \ge \frac{sn}{3ms\log n} s\sigma^2 = \frac{n}{3m\log n} s\sigma^2, \]

which is roughly a factor of $\log^2 n$ lower than the upper bound in (1). This result shows that the
recovery error with any adaptive measurements selected from some standard RIP ensemble again
falls short of the possible gains shown in (5).

We also note here that the bound in Theorem 2 is worse by a factor of $m/s$ than Theorem 1.
However, we believe this is necessary due to the fact that the only assumption we place on $A$ is
that (9) holds with overwhelming probability; this is a much weaker requirement than insisting on
DFT measurements as in Theorem 1. As a motivating example, fix some subset $\Lambda \subset \{1, \ldots, n\}$ of
size $s$. Construct a matrix $A$ by setting it to the DFT basis $F$, with its first row modified in the
following way: on $\Lambda$, multiply each entry by a factor of $C$ where $C^2 = m/(8s)$, and off of $\Lambda$ multiply
each entry by a factor $c = \sqrt{(n - sC^2)/(n - s)}$. This yields a matrix $A$ whose rows still have unit
norm. In addition, one can show that for this new matrix $A$, the property (9) still holds with the
same probability for $\delta = 5/8$ (that is, with the constants $0.5$ and $1.5$ replaced by $1 - \delta$ and $1 + \delta$)
for any $(m + 1) \times n$ submatrix $\tilde{A}$. Construct an $(m + 1) \times n$ matrix
$A'$ with the first row of $A$ repeated $m - s + 2$ times (since at least $s$ rows need to be unique). Then
one computes that $\|A'_\Lambda\|_F^2 = \frac{s}{n}(s - 1 + C^2(m - s + 2)) \gtrsim m^2/n$. On the other hand, any matrix of
the same size adaptively constructed from the DFT basis $F$ has a squared Frobenius norm (restricted to $\Lambda$) equal
to $s(m + 1)/n$. Thus we may indeed lose an $m/s$ factor because of this weakened assumption.
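
The arithmetic in this example is easy to verify directly. The following sketch uses hypothetical values of $n$, $m$ and $\Lambda$ chosen only for illustration (and 0-based column indices, unlike the text). It builds the modified first row, confirms that it still has unit norm, and compares $\|A'_\Lambda\|_F^2$ with the value $s(m+1)/n$ obtained from unmodified DFT rows.

```python
import cmath
import math

def dft_entry(j, k, n):
    """Entry f_jk of the unitary DFT matrix in (6)."""
    return cmath.exp(-2j * cmath.pi * j * k / n) / math.sqrt(n)

def modified_first_row(n, support, m):
    """First DFT row (all entries 1/sqrt(n)) scaled by C on the support and by c off of it."""
    s = len(support)
    C2 = m / (8.0 * s)                 # C^2 = m / (8 s)
    c2 = (n - s * C2) / (n - s)        # c^2 = (n - s C^2) / (n - s)
    return [math.sqrt((C2 if k in support else c2) / n) for k in range(n)]

# Hypothetical sizes chosen only for illustration.
n, m = 256, 128
support = {3, 10, 20, 41}
s = len(support)

row0 = modified_first_row(n, support, m)
row_norm_sq = sum(v * v for v in row0)                 # remains 1

repeats = m - s + 2                                    # copies of the modified row in A'
fro_modified = repeats * sum(row0[k] ** 2 for k in support) + sum(
    abs(dft_entry(j, k, n)) ** 2 for j in range(1, s) for k in support)
closed_form = s / n * (s - 1 + (m / (8.0 * s)) * repeats)
fro_dft_only = s * (m + 1) / n                         # any m + 1 unmodified DFT rows
```

For these values the modified construction concentrates roughly $C^2 = m/(8s) = 4$ times more energy on $\Lambda$ than any selection of unmodified DFT rows.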

Proof of Theorem 2. Let $A'$ be the $m \times n$ matrix of the adaptively selected rows as in the theorem.
Fix a support set $\Lambda$ of size at most $s$. Let $A'_\Lambda$ be the restriction of $A'$ to the support set $\Lambda$. We
will prove the result by showing a bound on the norm of the rows of $A'_\Lambda$, which we obtain via an
argument by contradiction. To that end, let $a^\star$ be the row of $A$ corresponding to the row of $A'_\Lambda$
with the greatest Euclidean norm. Now consider drawing a random $(m + 1) \times n$ submatrix $\tilde{A}$ of
$A$ that contains $a^\star$ as a row. Then one can compute that any such submatrix $\tilde{A}$ satisfies (9) (with
$m' = m + 1$) with probability at least $1 - \exp(-cn)\,n/(m + 1) \ge 1 - \exp(-cn)\,n$. Indeed, one sees
formally that

\[ \mathbb{P}(\tilde{A} \text{ does not satisfy (9)} \mid a^\star \text{ is a row of } \tilde{A}) = \frac{\mathbb{P}(\tilde{A} \text{ does not satisfy (9) and } a^\star \text{ is a row of } \tilde{A})}{\mathbb{P}(a^\star \text{ is a row of } \tilde{A})} \le \frac{\mathbb{P}(\tilde{A} \text{ does not satisfy (9)})}{\mathbb{P}(a^\star \text{ is a row of } \tilde{A})} \le \frac{\exp(-cn)}{(m + 1)/n}. \]

Now let $\tilde{A}^c$ be the remainder of the matrix, i.e., all rows of $\tilde{A}$ except row $a^\star$. Similarly, one
computes that any such matrix $\tilde{A}^c$ satisfies (9) (with $m' = m$) with probability at least
$1 - \exp(-cn)\,n/(n - m) \ge 1 - \exp(-cn)\,n$. Thus both of these matrices satisfy (9) with probability at
least $1 - 2\exp(-cn)\,n > 0$. For the sake of a contradiction, suppose that $\|a^\star_\Lambda\|_2^2 > 3m/n$. Observe
that the signal $x \in \mathbb{R}^n$, where $x_\Lambda = a^\star_\Lambda$ and $x$ is padded with zeros off of the support $\Lambda$, is an $s$-sparse
signal. Then since both matrices satisfy (9), we must have that

\[ \|\tilde{A}x\|_2^2 = \|\tilde{A}^c x\|_2^2 + |\langle a^\star, x \rangle|^2 \ge 0.5 \frac{m}{n} \|x\|_2^2 + \|x\|_2^4 > \Big(0.5 \frac{m}{n} + \frac{3m}{n}\Big) \|x\|_2^2 = \frac{3.5m}{n} \|x\|_2^2. \]

On the other hand, we must also have that

\[ \|\tilde{A}x\|_2^2 \le 1.5 \frac{m + 1}{n} \|x\|_2^2. \]

Combining these means that $3.5m < 1.5(m + 1)$, i.e., $m < 3/4$, which is a contradiction. Thus, it must be that
$\|a^\star_\Lambda\|_2^2 \le 3m/n$. Since $a^\star_\Lambda$ is the largest row of $A'_\Lambda$, we then have that

\[ \|A'_\Lambda\|_F^2 \le m \|a^\star_\Lambda\|_2^2 \le \frac{3m^2}{n}. \]

Following the same argument as in the proof of Theorem 1 we thus have that

\[ \inf_{\hat{x}} \sup_{\|x\|_0 \le s,\ \|x\|_2 \ge R} \mathbb{E}\|\hat{x} - x\|_2^2 \ge \frac{s^2 \sigma^2}{\|A'_\Lambda\|_F^2} \ge \frac{sn}{3m^2} s\sigma^2, \]

which completes the proof. □

3 Adaptivity through optimal experimental design


Although there are some settings where constrained adaptive sensing does not offer substantial
improvement over the nonadaptive scheme, one can of course ask if there are other settings where
notable gains are still possible. In order to address this question, we consider the simplified constrained
adaptive sensing problem where we assume the support $\Lambda$ of the signal $x$ (with respect to
the sparsity basis $\Psi$) is known, or some estimate of the support is provided. How would we choose
the measurements to best make use of this information, while still respecting that the measurements
are constrained to be from the measurement ensemble $\mathcal{M}$?

Let $\{a_i\}_{i=1}^m$ denote a sequence of length $m$ with elements $a_i \in \mathcal{M}$ corresponding to the measurements
of $\mathcal{M}$ that are chosen.^5 Then, denote by $A'$ the $m \times n$ matrix (recall $\mathcal{M} \subset \mathbb{C}^n$) whose
$i$-th row is $a_i$. If $\Lambda = \mathrm{supp}(x)$, then it can be shown by following the arguments in the proof of
Theorem 1 that the optimal MSE satisfies

\[ \mathbb{E}\|\hat{x} - x\|_2^2 = \|(A'\Psi_\Lambda)^\dagger\|_F^2 \sigma^2 = \mathrm{tr}\big(((A'\Psi_\Lambda)^* A'\Psi_\Lambda)^{-1}\big) \sigma^2, \tag{11} \]

where $(A'\Psi_\Lambda)^\dagger$ denotes the Moore-Penrose pseudoinverse of $A'\Psi_\Lambda$, $\sigma^2$ is the variance of the noise
term as in (2), $\Psi_\Lambda$ is the submatrix of $\Psi$ restricted to the columns indexed by $\Lambda$, and $A'\Psi_\Lambda$ is
assumed to have full (column) rank. Our goal is to find a length-$m$ measurement sequence $\{a_i\}_{i=1}^m$
that minimizes (11), which is equivalent to solving

\[ \{\hat{a}_i\}_{i=1}^m = \arg\min_{\{a_i\}_{i=1}^m} \mathrm{tr}\big(((A'\Psi_\Lambda)^* A'\Psi_\Lambda)^{-1}\big), \tag{12} \]

where $A' = A'(\{a_i\}_{i=1}^m)$ is constructed as described above. Note that an essentially equivalent way
to state (12) (up to a permutation of the measurements) is via the discrete optimization problem

\[ \hat{S} = \arg\min_{\text{diagonal matrices } S \succeq 0,\ s_{ii} \in \mathbb{Z}_+} \mathrm{tr}\big(((A\Psi_\Lambda)^* S A\Psi_\Lambda)^{-1}\big) \quad \text{subject to } \mathrm{tr}(S) \le m, \tag{13} \]

where $A$ is the $|\mathcal{M}| \times n$ matrix containing all possible measurement vectors from $\mathcal{M}$ and $s_{ii} \in \mathbb{Z}_+$
forces each diagonal entry of $S$ to be a non-negative integer (reflecting the multiplicity of each $a_i$).
Both (12) and (13) reflect the optimization problem that we would ideally like to solve. Unfortunately,
they are computationally demanding discrete optimization problems; hence, we instead
consider the relaxation of (13):

\[ \hat{S} = \arg\min_{\text{diagonal matrices } S \succeq 0} \mathrm{tr}\big(((A\Psi_\Lambda)^* S A\Psi_\Lambda)^{-1}\big) \quad \text{subject to } \mathrm{tr}(S) \le m, \tag{14} \]

where the constraint $\mathrm{tr}(S) \le m$ ensures that the resulting “weighted” sensing matrix $\sqrt{S}A$ satisfies
the “sensing energy” constraint $\|\sqrt{S}A\|_F^2 \le m$ when the rows of $A$ are normalized. Note that
this is equivalent to the continuous design for the A-optimality criterion studied in the optimal
experimental design literature [32].

^5 We use a sequence of elements from $\{1, \ldots, |\mathcal{M}|\}$ rather than a subset to emphasize that the $m$ measurements from
$\mathcal{M}$ need not be distinct. Note that in the general adaptive setting the order of the measurements is also important;
however, in the context of this section there is only one batch of adaptive measurements and thus the order within
this batch has no impact.

Fortunately, (14) is a convex problem [5] and can be efficiently solved by a number of methods.
Whereas the problem in (12) would tell us which measurements and how many of each to use
from $\mathcal{M}$, (14) instead tells us, through the diagonal matrix $\hat{S}$ of weights, “how much” of each
measurement to use. We simply weight each $a_i$ by $\sqrt{\hat{s}_{ii}}$, where $\hat{s}_{ii}$ denotes the $i$-th element on the
diagonal of $\hat{S}$.
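
To make the relaxation (14) concrete, the sketch below minimizes the A-optimality objective for a toy problem with $s = 2$, where the $2 \times 2$ matrix $(A\Psi_\Lambda)^* S A\Psi_\Lambda$ can be inverted in closed form. This is an illustration only and is not the solver used in the paper: it swaps in a simple multiplicative weight update from the experimental design literature in place of a general convex solver, and the function names, candidate rows and iteration count are all assumptions. Each candidate row $b_i$ of $A\Psi_\Lambda$ has its weight multiplied by $\sqrt{d_i}$, where $d_i = b_i M^{-2} b_i^*$ is minus the partial derivative of the objective with respect to $s_{ii}$, after which the weights are rescaled so that $\mathrm{tr}(S) = m$.

```python
import math

def info_matrix(rows, w):
    """Entries (p, q, r) of the Hermitian matrix sum_i w_i b_i^* b_i = [[p, q], [conj(q), r]]."""
    p = sum(wi * abs(b[0]) ** 2 for wi, b in zip(w, rows))
    r = sum(wi * abs(b[1]) ** 2 for wi, b in zip(w, rows))
    q = sum(wi * complex(b[0]).conjugate() * complex(b[1]) for wi, b in zip(w, rows))
    return p, q, r

def a_opt_objective(rows, w):
    """tr(((A Psi)^* S (A Psi))^{-1}) for S = diag(w), specialised to s = 2."""
    p, q, r = info_matrix(rows, w)
    return (p + r) / (p * r - abs(q) ** 2)

def relaxed_design(rows, m, iters=200):
    """Multiplicative-update heuristic for (14); an assumed stand-in for a convex solver."""
    w = [m / len(rows)] * len(rows)            # uniform start with tr(S) = m
    for _ in range(iters):
        p, q, r = info_matrix(rows, w)
        det = p * r - abs(q) ** 2
        d = []
        for b in rows:
            c0, c1 = complex(b[0]).conjugate(), complex(b[1]).conjugate()
            u0 = (r * c0 - q * c1) / det                 # first entry of M^{-1} b^*
            u1 = (-q.conjugate() * c0 + p * c1) / det    # second entry of M^{-1} b^*
            d.append(abs(u0) ** 2 + abs(u1) ** 2)        # d_i = b_i M^{-2} b_i^*
        w = [wi * math.sqrt(di) for wi, di in zip(w, d)]
        scale = m / sum(w)
        w = [scale * wi for wi in w]
    return w

# Hypothetical candidate rows of A Psi_Lambda (unit norm), with m = 3 measurements.
rows = [(1.0, 0.0), (0.0, 1.0), (2 ** -0.5, 2 ** -0.5)]
m = 3
uniform = [1.0, 1.0, 1.0]
w_hat = relaxed_design(rows, m)
```

In this toy instance the weights drift away from the third candidate, which is redundant with the first two, and the objective decreases from $1.5$ under uniform weights toward the optimal value $4/3$. Convergence of this particular update is slow here, so a dedicated convex solver would be preferable in practice.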

If, on the other hand, we use the measurement model where we must choose m unweighted
measurements from $\mathcal{M}$, the practical use of $\widehat{S}$ from (14) is less obvious. We experimented with
several different (though likely sub-optimal) approaches to using the weights in $\widehat{S}$, and the
following method empirically seemed to produce the best results. We use a simple
sampling scheme to obtain a discrete design. Specifically, we draw exactly m measurements, with
replacement, according to the probability mass function

\[
p_i = \frac{\hat{s}_{ii}}{m}. \tag{15}
\]

We guarantee that the resulting matrix A' has rank at least s by rejecting any construction for which
this constraint is not satisfied. These m measurements then form the rows of the sensing matrix A'.
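The sampling step can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name `sample_discrete_design`, the retry limit `max_tries`, and the random inputs are hypothetical, and the rank test is applied to the selected rows exactly as described above.

```python
import numpy as np

def sample_discrete_design(A, S_diag, m, s, rng, max_tries=1000):
    """Draw m row indices with replacement from p_i = s_ii / m, rejecting rank-deficient draws."""
    p = np.asarray(S_diag, dtype=float)
    p = p / p.sum()                            # equals s_ii / m whenever tr(S) = m
    for _ in range(max_tries):
        idx = rng.choice(len(p), size=m, replace=True, p=p)
        if np.linalg.matrix_rank(A[idx]) >= s:  # keep the draw only if A' has rank at least s
            return idx
    raise RuntimeError("no draw satisfied the rank constraint")

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 8))               # hypothetical pool of 20 candidate measurements
weights = rng.random(20) + 0.1
weights = 10 * weights / weights.sum()         # hypothetical weights with tr(S) = m = 10
rows = sample_discrete_design(A, weights, m=10, s=3, rng=rng)
A_prime = A[rows]                              # rows of the discrete sensing matrix A'
```

Repeated indices are allowed, which matches the fact that the m measurements need not be distinct.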



4 Case study: Fourier measurements of Wavelet sparse signals 

The results of Section [2] demonstrate that adaptive sensing cannot offer substantial improvements 
over nonadaptive sensing for certain classes of measurement ensembles when the signal is sparse in 
the canonical basis. We next explore the case when $\Psi$ is instead a wavelet basis and we acquire
DFT measurements (this is indeed the setting of Figure 1, which suggests dramatic potential
improvements from constrained adaptive sensing). This setting serves as a somewhat idealized model
for a number of applications in tomography and other medical imaging since physical limitations 
would entail that we can only acquire DFT measurements, and realistic images are generally sparse 
with respect to wavelet bases [U]. In this setting we might receive one DFT measurement at a 
time, and from those, we can (potentially in real time) request the next DFT coefficient to be 
measured. 

For our first two sets of experiments, we will assume the sparsity basis $\Psi$ is the Haar wavelet
basis. We will denote the n × n discrete Haar wavelet transform by H, with entries $h_{jk}$ for
$j, k = 0, 1, \ldots, n-1$, where n is assumed to be some power of 2. When j = 0, we have

\[
h_{0k} = \frac{1}{\sqrt{n}}. \tag{16}
\]

For indices j > 0, we write $j = 2^p + q - 1$, where $p = \lfloor \log_2 j \rfloor$ and q are nonnegative integers, and

\[
h_{jk} = \frac{1}{\sqrt{n}}
\begin{cases}
2^{p/2} & (q-1)n/2^p \le k < (q-0.5)n/2^p \\
-2^{p/2} & (q-0.5)n/2^p \le k < qn/2^p \\
0 & \text{otherwise.}
\end{cases} \tag{17}
\]

Since the Haar wavelet basis H is a sparsifying transformation, for a signal (or image) x
we have that $Hx = \alpha$, with $\|\alpha\|_0 \le s$. This means $x = H^*\alpha$, where $H^*$ denotes the adjoint of H,
for which $H^* = H^{-1}$ since H is unitary.
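The definition of the Haar transform can be checked numerically. The sketch below builds H row by row from the indexing $j = 2^p + q - 1$ and confirms that the result is orthogonal, so that its adjoint is its inverse. The function name `haar_matrix` is hypothetical, and the construction assumes the standard normalization in which every row has unit norm.

```python
import numpy as np

def haar_matrix(n):
    """Build the n x n discrete Haar transform; n must be a power of 2."""
    assert n >= 2 and (n & (n - 1)) == 0
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)                 # scaling function row, j = 0
    for j in range(1, n):
        p = j.bit_length() - 1                 # p = floor(log2(j))
        q = j - 2 ** p + 1                     # so that j = 2^p + q - 1
        width = n // 2 ** p                    # support length n / 2^p
        start = (q - 1) * width
        mid = start + width // 2
        H[j, start:mid] = 2 ** (p / 2) / np.sqrt(n)            # positive half
        H[j, mid:start + width] = -(2 ** (p / 2)) / np.sqrt(n)  # negative half
    return H

H = haar_matrix(8)
gram = H @ H.T                                 # should be the identity if H is unitary
```

For n = 8, row j = 1 is the coarsest wavelet: four positive entries followed by four negative entries, each of magnitude $1/\sqrt{8}$.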
Figure 2: (Large measurement regime) The median squared error versus the number of measurements m
when the nonzero locations of $\alpha$ are selected on a sparse tree (left) or uniformly at random (right). The
nonadaptive (red) and adaptive (blue) recovery is shown when either VDS or uniform sampling is used for
the nonadaptive measurements, and CoSaMP (top) or $\ell_1$-minimization (bottom) is used; the oracle adaptive
(black) recovery is also included for comparison.


With this notation in hand and recalling that F is the n × n DFT, the expected squared error becomes

\[
\mathbb{E}\|\hat{x} - x\|_2^2 = \|(F'H^*_\Lambda)^\dagger\|_F^2\,\sigma^2, \tag{18}
\]

where F' is the m × n sensing matrix consisting of the m adaptively chosen vectors from F and $H^*_\Lambda$
is the n × s submatrix of $H^*$ restricted to the columns indexed by $\Lambda = \operatorname{supp}(\alpha) = \operatorname{supp}(Hx)$. Thus,
we see that the optimal MSE depends on the correlations of the DFT and Haar basis elements. In
a similar manner, in our last experiment, where the signal is an MRI image, we will assume the
sparsity basis $\Psi$ is the Daubechies wavelet with 3 vanishing moments (D6).

We now present a suite of numerical simulations in these settings that employ the relaxation (14)
followed by the sampling scheme described in Section [3] to select a sequence of m DFT measurement
vectors. We then follow with a short analysis for the simple case of 1-sparse signals.

4.1 Simulations 

Here we present a practical implementation of adaptive sensing obtained via the relaxation (14),
which we then compare with the results of traditional nonadaptive sensing. To implement (14), we
use the Templates for First-Order Conic Solvers (TFOCS) software package OH]. 
Figure 3: (Small measurement regime) The median squared error versus the number of measurements m
when the nonzero locations of $\alpha$ are selected on a sparse tree (left) or uniformly at random (right). The
nonadaptive (red) and adaptive (blue) recovery is shown when either VDS or uniform sampling is used for 
the nonadaptive measurements, and CoSaMP is used; the oracle adaptive (black) recovery is also included 
for comparison. 


For our first two sets of experiments, we set $\mathcal{M}$ to be the ensemble of n measurements from the
n × n DFT matrix F. We define x to be a 10-sparse signal in the Haar wavelet basis (i.e., $\Psi = H^*$)
with the values on the support of $\alpha$ distributed i.i.d. as $\mathcal{N}(\sqrt{n}, 1)$ and the measurement noise z
is distributed as i.i.d. AA(0, 10“^)]^ Unless otherwise stated, the signal is of dimension n = 1024. 
We consider signals whose support is chosen uniformly at random, and also those whose support 
obeys a tree structure. Briefly, in the latter case the support is organized on a binary tree, plus 
an extra node at the top. The first scaling (or lowest frequency) coefficient has just one child; 
the second and further wavelet coefficients have two children each. This model is characteristic 
of natural images which tend to have inter-scale correlations (see [13, 19] for similar wavelet-tree
constructions). An s-sparse support is filled by choosing the first scaling location, and then in 
each of the s — 1 remaining rounds, choosing one node randomly among the unfilled nodes which 
currently have a chosen parent. 
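The tree-sparse support model described above can be sketched as follows. The node numbering is an assumption chosen to match the Haar indexing $j = 2^p + q - 1$: node 0 is the scaling coefficient with the single child 1, and every wavelet node j has children 2j and 2j + 1. The function name `tree_sparse_support` is hypothetical.

```python
import numpy as np

def tree_sparse_support(n, s, rng):
    """Grow an s-sparse support on the wavelet tree, starting from the scaling coefficient."""
    def children(j):
        candidates = [1] if j == 0 else [2 * j, 2 * j + 1]
        return [c for c in candidates if c < n]   # leaves have no children

    support = [0]                                  # the first scaling location is always chosen
    frontier = set(children(0))                    # unfilled nodes whose parent is already chosen
    for _ in range(s - 1):
        node = int(rng.choice(sorted(frontier)))   # uniform choice among eligible nodes
        support.append(node)
        frontier.remove(node)
        frontier.update(children(node))
    return sorted(support)

rng = np.random.default_rng(0)
support = tree_sparse_support(n=1024, s=10, rng=rng)
```

By construction every selected node other than the root has its parent in the support, which is the inter-scale structure the model is meant to capture.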

Nonadaptive sensing. Due to the lack of incoherence between the DFT and Haar bases, it has 
been observed (and recently theoretically shown [MIES]) that so-called Variable-Density Sampling
(VDS) is often preferable to standard uniform random selection of DFT measurements. In VDS,
sampling can be concentrated on the lower frequencies, producing superior recovery results. We
test recovery using either $\ell_1$-minimization [9, 35, 36] or the greedy pursuit CoSaMP m-

Adaptive sensing. In the more realistic setting, we employ a simple strategy which uses m/2
nonadaptive measurements (using either VDS or uniform sampling) to construct an estimate $\widehat{\Lambda}$ of
$\Lambda$. This is done by executing either $\ell_1$-minimization (followed by thresholding) or CoSaMP. We
then solve the relaxation (14) using this estimated support, and the remaining m/2 measurements
are selected adaptively^8 using the distribution given by (15). To recover the signal, either
$\ell_1$-minimization or CoSaMP is again used to obtain an updated estimate $\widehat{\Lambda}$ using all m measurements.^9
The final signal coefficient estimate is calculated as $\hat{\alpha}_{\widehat{\Lambda}} = (F'H^*_{\widehat{\Lambda}})^\dagger y$, where y is the m-dimensional
vector of measurements, F' is the m × n matrix of DFT measurements selected, and $H^*_{\widehat{\Lambda}}$ is the
n × 10 submatrix of $H^*$ restricted to the columns indexed by $\widehat{\Lambda}$. One could alternatively use the
signal estimate returned directly from the recovery algorithm, which we have observed to perform
similarly to (or only slightly worse than) our implemented method.

^6 We have found the adaptive procedure to be robust to the noise level, and to compare similarly to the corresponding
nonadaptive procedure even for larger noise levels.

^7 Following the experiments in [25], we also do not apply any preconditioning to the sensing matrix.

^8 Note that these m/2 adaptively selected measurements are only adapted to the first m/2 measurements, but
are nonadaptive with respect to each other. That is, only one instance of adaptive measurement selection is being
performed. Although in a different context, a similar two-stage approach is also taken in m-

Figure 4: The ratio (green) of the nonadaptive median squared recovery error to the adaptive median
squared recovery error versus the signal dimension n when the support locations of $\alpha$ are selected on a sparse
tree (left) or uniformly at random (right). The ratio is shown when either VDS or uniform sampling is used
for the nonadaptive measurements, and CoSaMP (solid line) or $\ell_1$-minimization (dashed line) is used. The
curves for log n (solid magenta) and n/s (dashed magenta) are included for comparison.

Oracle adaptive sensing. For sake of comparison, we also consider the case where the true 
support A of the signal is known a priori, and the measurements are selected as in the adaptive 
sensing case using this A. Recovery is then performed simply by applying the pseudoinverse: 


\[
\hat{\alpha}_\Lambda = (F'H^*_\Lambda)^\dagger y.
\]
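The pseudoinverse recovery on a known support amounts to a small least-squares solve. The sketch below is a toy illustration only: the function name `oracle_estimate`, the problem sizes, the random orthonormal matrix standing in for the sparsity basis, and the randomly chosen DFT rows are all assumptions, not the experimental setup of this paper.

```python
import numpy as np

def oracle_estimate(F_sel, Psi_L, y):
    """Estimate the coefficients on a known support via the pseudoinverse."""
    A = F_sel @ Psi_L                          # m x s effective sensing matrix
    return np.linalg.pinv(A) @ y               # least-squares solution on the support

rng = np.random.default_rng(0)
n, m, s = 16, 12, 3
F = np.fft.fft(np.eye(n)) / np.sqrt(n)         # unitary n x n DFT
Psi, _ = np.linalg.qr(rng.standard_normal((n, n)))  # stand-in orthonormal sparsity basis
support = [1, 5, 9]                            # hypothetical known support
rows = rng.choice(n, size=m, replace=False)    # hypothetical selected DFT rows
alpha_L = rng.standard_normal(s)
y_clean = F[rows] @ Psi[:, support] @ alpha_L  # noiseless measurements
alpha_hat = oracle_estimate(F[rows], Psi[:, support], y_clean)
```

With noiseless measurements and a full column rank sensing matrix the estimate reproduces the true coefficients, which makes this a convenient sanity check before adding noise.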


Figure [2] compares recovery results over 1000 trials for nonadaptive, adaptive, and oracle adaptive
sensing versus the number of measurements m, where m ranges between 100 and 1000. We
see that when the signal is supported on a tree, uniform sampling performs poorly for both
nonadaptive and adaptive sensing, as might be expected. The performance of the uniform sampling
methods can be understood via the empirical observation that in this case we require roughly 500
measurements before we can reliably estimate the support. When using the CoSaMP algorithm, the
sudden improvement at m ≈ 500 for uniform nonadaptive and at m ≈ 1000 for uniform adaptive
(which uses m/2 ≈ 500 measurements for support identification) corresponds to the threshold where
more than half of the trials resulted in a correct support recovery. In contrast, sampling with VDS
offers dramatic improvements for both nonadaptive and adaptive sensing with either reconstruction
algorithm, with adaptive sensing performing almost as well as the oracle. In this case, VDS is
already capturing much of the potential improvement offered by adaptivity because the energy of the
signal is heavily biased towards the lower frequencies, although adaptivity still results in somewhat
improved performance. In contrast to the tree-sparse case, when the signal support is selected
randomly, uniform nonadaptive sampling actually performs better than VDS, whereas adaptive
sensing performs similarly regardless of the type of nonadaptive measurements taken. Thus if one
is not sure of the signal structure in general, adaptive sensing can offer improvements in either case.
This flexibility represents one of the main advantages of adaptive sensing.

^9 Using dependent measurements is of course not justified theoretically, but we found unsurprisingly that using all
m measurements gave better empirical results.

Figure [3] studies the same setting as Figure [2] when using CoSaMP for recovery, but focuses 
on the small measurement regime. These results illustrate that there are regions, however narrow, 
where the nonadaptive method can succeed while the adaptive method fails. This is expected due 
to the nature of the adaptive scheme, where only m/2 measurements are utilized to identify a 
support estimate. At some point, the support can be sufficiently estimated with m, but not m/2, 
measurements. For tree-sparse signals, nonadaptive sensing with VDS measurements outperforms 
adaptive sensing with VDS nonadaptive measurements when m ≈ 50. For uniformly sparse signals,
we see this behavior even more clearly for both VDS and uniform sampling. 

The results of our second simulation are shown in Figure 4, where we compare the ratio of
nonadaptive to adaptive sensing recovery error over 200 trials against the dimension n of the signal
x; the number of measurements used is always m = 0.6n (rounding when necessary). We note
that since the norms of $\alpha$ and z both scale with n, the SNR remains roughly the same for all
signal dimensions n. We observe similar results as Figure O demonstrating the behavior holds as
a function of dimension. 

In our last experiment, we evaluate our adaptive approach on real images. This scenario differs 
from previous experiments in two key aspects. First, the signal of interest is a two-dimensional (2D) 
image, not a one-dimensional vector, and thus we use 2D DFT measurements and a 2D discrete 
wavelet transform as the sparsity basis. Second, the image is not exactly sparse in any wavelet 
basis. Hence, when estimating the sparse support we introduce an additional (non-Gaussian) source 
of error, the contribution of the off-support wavelet coefficients. We note that the choice of the 
parameter s, which we have not attempted to optimize, can have an impact on signal reconstruction. 

The image we use, brain.mat^10, is rescaled to be 64 × 64, and is shown in Figure 5. We use the
Daubechies wavelet with 3 vanishing moments (D6) in a full 2D decomposition (i.e., $\log_2 64 = 6$
levels). We set the parameter s = 1000, which we again note was neither tuned nor optimized.
Additionally, we introduce white Gaussian noise at the level of $\sigma = 0.01$ to each measurement.

The experiment proceeds as follows: in the nonadaptive case, m measurements are taken according
to VDS. The set of recovered wavelet coefficients is obtained using $\ell_1$-minimization and the
image is reconstructed using the inverse wavelet transform. Note that the output of $\ell_1$-minimization
is not necessarily exactly s-sparse. The assumed sparsity s guides our choice of the $\ell_2$ error term
constraint, but we did no thresholding afterwards. We evaluate performance by the median peak
signal-to-noise ratio (PSNR) in dB over 50 trials. 

In the adaptive case, m/2 VDS measurements are taken as in the previous nonadaptive case.
Then, via the $\ell_1$-minimization reconstruction, we determine the estimated top s wavelet coefficients
in each of 50 trials. We choose the trial with accuracy (in terms of the number of correctly identified
top s wavelet coefficients) closest to the median accuracy. Utilizing the size s support estimate
identified, we solve the relaxation (14) and select the remaining m/2 measurements adaptively.
Finally, we recover the signal via $\ell_1$-minimization, and, as before, reconstruct the final wavelet
coefficients using the pseudoinverse. Again, we evaluate performance by the median PSNR over 50 
trials of the adaptive measurement selection. 

The adaptive and nonadaptive recovered images of the single trial with the closest to median
PSNR performance using a total of m = 3000 measurements are given in Figure 5. Notice that the
PSNR of the adaptive strategy is 28.02 dB, which exceeds the PSNR of 25.03 dB of the nonadaptive
strategy. Visually, the adaptive strategy more closely resembles the original image. The median
PSNR as the number of measurements is varied is shown in Figure 6. The plot shows that as the
number of measurements reaches a certain level (roughly above 2000 measurements), the two-stage
adaptive approach begins to exceed the method which is purely nonadaptive. Hence, as long as
enough nonadaptive measurements are taken to obtain a sufficient support estimate, the adaptive
procedure can improve image reconstruction quality.

^10 Obtained from http://www.eecs.berkeley.edu/~mlustig/CS.html

[Figure 5 panels: Original; Nonadaptive, PSNR = 25.03 dB; Adaptive, PSNR = 28.02 dB.]

Figure 5: (Top) The 64 × 64 brain.mat image used in our medical imaging experiments. Reconstructed
images with the closest to median PSNR among the 50 trials of nonadaptive (bottom left)
and adaptive (bottom right) sensing with m = 3000 measurements.

We note that adaptive approaches to medical imaging have also been studied using an alternative 
Bayesian model for sampling optimization [28, 33, 34]. The work [33] studies the optimization of
sequential sampling over stacks of neighboring image slices. In future work, it would be interesting
to extend our proposed adaptive sampling scheme to this setting. Our method, however, is a
framework for general adaptive sensing, not tuned specifically for medical imaging.

4.2 Analysis of the 1-sparse case 

We now provide some analytical justification explaining why adaptive sensing can achieve a lower
MSE than nonadaptive sensing for the Haar wavelet basis with DFT measurements, but show
that the largest gains are realized for a small fraction of the possible signal support sets. We
consider the simple case when s = 1 and the support is eventually known (either by oracle or by
utilizing some method for estimation, as in the above experiments), and use this toy problem as
motivational justification for the general setting. If we denote the s singular values of $F'H^*_\Lambda$ by
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_s > 0$, then in general we want to minimize

\[
\|(F'H^*_\Lambda)^\dagger\|_F^2 = \sum_{i=1}^{s} \frac{1}{\sigma_i^2}. \tag{19}
\]

When s = 1, (19) becomes $\|(F'H^*_\Lambda)^\dagger\|_F^2 = 1/\sigma_1^2$. Minimizing this quantity is the same as
maximizing $\|F'H^*_\Lambda\|_F^2 = \sigma_1^2$. It is easy to see that $\|F'H^*_\Lambda\|_F^2$ is maximized when a measurement of
F that is most correlated with $H^*_\Lambda$ is chosen for every measurement in F'. That is, if such a row
can be identified, the best way to sense 1-sparse signals adaptively once the support is known is
to simply repeat that measurement until the number of allotted measurements has been reached.
Note that $F'H^*_\Lambda$ is m × 1 in this case, and is still full rank when selecting the measurements in this
way; thus, the theory leading to (18) still holds. In this setting, we can determine explicitly what
the MSE looks like, and provide bounds on the MSE that depend on the support location of the
1-sparse signal. The result assuming a known support is provided in Theorem 3. The result in the
more realistic context of adaptive sensing, where the first half of the measurements are selected
nonadaptively, immediately follows and is provided in Corollary 1.

Figure 6: The median PSNR versus the number of measurements m over 50 trials of nonadaptive
and adaptive sensing of the 64 × 64 brain.mat image.
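The quantity being minimized in this subsection can be written in three equivalent ways: as the squared Frobenius norm of the pseudoinverse, as the sum of inverse squared singular values, and as the trace objective used in the relaxation. A short numerical check, with a random complex matrix standing in for $F'H^*_\Lambda$ (the matrix and the function name `a_criterion_three_ways` are assumptions made for illustration):

```python
import numpy as np

def a_criterion_three_ways(A):
    """Return ||A^dagger||_F^2, sum_i 1/sigma_i^2, and tr((A^* A)^{-1}) for a full column rank A."""
    pinv_norm = np.linalg.norm(np.linalg.pinv(A), "fro") ** 2
    singular_values = np.linalg.svd(A, compute_uv=False)
    inverse_sum = float(np.sum(1.0 / singular_values ** 2))
    trace_form = float(np.trace(np.linalg.inv(A.conj().T @ A)).real)
    return pinv_norm, inverse_sum, trace_form

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 4)) + 1j * rng.standard_normal((20, 4))  # stand-in for F' H*_Lambda
pinv_norm, inverse_sum, trace_form = a_criterion_three_ways(A)
```

The agreement of the three values is what links the design criterion of Section 3 to the singular value expression analyzed here.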

Theorem 3. Denote by $x = H^*\alpha$ the signal of interest, and suppose x becomes 1-sparse after
applying the Haar wavelet transformation H (that is, $Hx = \alpha$ and $\|\alpha\|_0 = 1$). Let $\operatorname{supp}(\alpha) = \Lambda$,
and suppose the support $\Lambda$ is completely known. Suppose we measure repeatedly with a particular
measurement from the n × n DFT F; denote this measurement by $f_j$, where
$j \in \{0, 1, \ldots, n-1\}$ is some row index. Then, our observations are of the form

\[
y_i = \langle f_j, H^*_\Lambda \alpha_\Lambda \rangle + z_i, \tag{20}
\]

for i = 1, ..., m, where the noise $z_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. Then the MSE is given by

\[
\mathbb{E}\|\hat{x} - x\|_2^2 = \frac{1/m}{|\langle f_j, H^*_\Lambda \rangle|^2}\,\sigma^2 \tag{21}
\]

and, when $f_j$ is the row of F most correlated with $H^*_\Lambda$, is bounded by

\[
\frac{\sigma^2}{m} \le \mathbb{E}\|\hat{x} - x\|_2^2 \le \frac{n\sigma^2}{2m}, \tag{22}
\]

where the expectation is taken with respect to z.
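The MSE expression for repeated measurements can be checked by simulation. The sketch below is an illustration under stated assumptions: the complex scalar `c` stands in for the inner product $\langle f_j, H^*_\Lambda \rangle$, and the values of `alpha`, `m`, `sigma`, and the number of trials are hypothetical.

```python
import numpy as np

def simulate_repeated_measurement(c, alpha, m, sigma, trials, rng):
    """Empirical mean squared error when one measurement with inner product c is repeated m times."""
    A = np.full(m, c, dtype=complex)           # m-dimensional column vector with equal entries
    A_pinv = A.conj() / np.vdot(A, A).real     # pseudoinverse: every entry equals 1 / (m c)
    z = sigma * rng.standard_normal((trials, m))
    y = alpha * A + z                          # each row is one realization of the m observations
    alpha_hat = y @ A_pinv                     # apply the pseudoinverse to every realization
    return float(np.mean(np.abs(alpha_hat - alpha) ** 2))

rng = np.random.default_rng(0)
c, alpha, m, sigma = 0.3 + 0.4j, 2.0, 50, 0.1
empirical = simulate_repeated_measurement(c, alpha, m, sigma, trials=20000, rng=rng)
predicted = sigma ** 2 / (m * abs(c) ** 2)     # (1/m) / |c|^2 * sigma^2
```

The empirical value fluctuates around the prediction, with a relative spread of roughly one percent for this number of trials.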


Note that in standard compressive sensing when we rely on the RIP, the DFT matrix F is
normalized by $1/\sqrt{m}$ rather than $1/\sqrt{n}$. If we make this normalization, the bound in (22) becomes

\[
\frac{\sigma^2}{n} \le \mathbb{E}\|\hat{x} - x\|_2^2 \le \frac{\sigma^2}{2}.
\]

Including pre-conditioning and other scalings of course yields an analogous bound.
The proof of Theorem 3 relies on three lemmas that provide bounds for the term $|\langle f_j, H^*_\Lambda \rangle|$
appearing in the MSE in (21). Specifically, one term of interest is

\[
\min_{\Lambda \in \{0,\ldots,n-1\}} \; \max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle|.
\]

The maximization over j corresponds to selecting the best DFT measurement $f_j$, and the
minimization accounts for the worst case signal (i.e., the worst case support $\Lambda$). On the other hand, we
also want to obtain a value that represents the best case signal so we are also interested in

\[
\max_{\Lambda \in \{0,\ldots,n-1\}} \; \max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle|. \tag{24}
\]


Before proving Theorem 3, let us set some notation. The Haar wavelet transform matrix
H, defined in (16) and (17), consists of blocks of consecutive rows with the same nonzero entry
magnitudes. Let $1 \le a \le \log_2 n$ denote the block of H, where a = 1 corresponds to the n/2 rows
indexed by j = n/2, ..., n - 1 (i.e., the “bottom” half of H), a = 2 corresponds to the n/4 rows indexed
by j = n/4, ..., n/2 - 1, and so on. Similarly, for $H^*$, instead of blocks of rows, we have blocks of
columns; the block corresponding to $a = \log_2 n$ represents the lowest frequency wavelets, and the
block corresponding to a = 1 represents the highest frequency wavelets.


Proof of Theorem 3. This proof requires the following three lemmas. Lemmas 1 and 2 are used to
prove Lemma 3, and Lemma 3 is used to complete the proof of Theorem 3. The lemmas can be
derived using elementary trigonometric bounds, and we omit the proofs here.

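Lemma 1. Fix $j \in \mathbb{Z}$ where $1 \le j \le n-1$ and let $a = 1, \ldots, \log_2 n$. Choose $k \in \mathbb{Z}$, $0 \le k \le n - 2^{a-1}$.
Then

\[
\left| \sum_{q=k}^{k+2^{a-1}-1} e^{-2\pi i jq/n} \right| = \sqrt{\frac{1 - \cos(2^{a}\pi j/n)}{1 - \cos(2\pi j/n)}}. \tag{25}
\]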
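Although the proofs are omitted, the identity underlying Lemma 1 follows from the finite geometric series. A brief sketch, written for a generic run of N consecutive indices (the symbols N and $\omega$ are notation introduced here for illustration only):

```latex
\begin{align*}
\Bigl|\sum_{q=k}^{k+N-1} \omega^{q}\Bigr|
  &= |\omega^{k}| \, \Bigl|\frac{1-\omega^{N}}{1-\omega}\Bigr|
   = \frac{|1-\omega^{N}|}{|1-\omega|},
   \qquad \omega = e^{-2\pi i j/n}, \quad 1 \le j \le n-1, \\
|1 - e^{-i\theta}|^{2} &= 2 - 2\cos\theta
  \quad \Longrightarrow \quad
\Bigl|\sum_{q=k}^{k+N-1} e^{-2\pi i jq/n}\Bigr|
   = \sqrt{\frac{1-\cos(2\pi jN/n)}{1-\cos(2\pi j/n)}} .
\end{align*}
```

Taking $N = 2^{a-1}$, the length of each constant half of a block-a Haar wavelet, turns the numerator into $1 - \cos(2^{a}\pi j/n)$. The restriction $1 \le j \le n-1$ guarantees $\omega \ne 1$, so the denominator does not vanish.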


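Lemma 2. Let $f_j$, $j \in \{0, \ldots, n-1\}$, be row j from the n × n DFT and let $H^*_\Lambda$ be the inverse
discrete Haar wavelet transform restricted to the column indexed by $\Lambda$. Let $a = 1, \ldots, \log_2 n$ denote
the block of $H^*$ and let $\Lambda \in \{1, 2, \ldots, n-1\}$, $|\Lambda| = 1$, be a column in the set corresponding to block
a. Then,

\[
|\langle f_j, H^*_\Lambda \rangle| = \sqrt{\frac{2^{1-a}}{n}} \cdot \frac{1 - \cos(2^{a}\pi j/n)}{\sqrt{1 - \cos(2\pi j/n)}}, \tag{26}
\]

where j = 1, ..., n - 1. When j = 0,

\[
|\langle f_0, H^*_\Lambda \rangle| =
\begin{cases}
1 & \Lambda = 0 \\
0 & \Lambda \in \{1, \ldots, n-1\}.
\end{cases}
\]

When $\Lambda = \{0\}$,

\[
|\langle f_j, H^*_{\{0\}} \rangle| =
\begin{cases}
1 & j = 0 \\
0 & j = 1, 2, \ldots, n-1.
\end{cases}
\]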
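The magnitude of the correlation between a DFT row and a single Haar wavelet can be checked directly. The sketch below compares a brute-force inner product with the closed form $\sqrt{2^{1-a}/n}\,(1-\cos(2^{a}\pi j/n))/\sqrt{1-\cos(2\pi j/n)}$, which follows from Lemma 1 applied to the two constant halves of the wavelet. The helper names and the choice n = 32 are assumptions made for illustration.

```python
import numpy as np

def haar_column(n, a, k0):
    """Block-a Haar wavelet starting at index k0: 2^(a-1) positive then 2^(a-1) negative entries."""
    half = 2 ** (a - 1)
    h = np.zeros(n)
    h[k0:k0 + half] = 2.0 ** (-a / 2)
    h[k0 + half:k0 + 2 * half] = -(2.0 ** (-a / 2))
    return h

def direct(n, a, j, k0):
    """Brute-force |<f_j, h>| with f_j a row of the unitary DFT."""
    f_j = np.exp(-2 * np.pi * 1j * j * np.arange(n) / n) / np.sqrt(n)
    return abs(f_j @ haar_column(n, a, k0))

def closed_form(n, a, j):
    """Closed-form magnitude for 1 <= j <= n - 1; it does not depend on the offset k0."""
    numerator = 1 - np.cos(2 ** a * np.pi * j / n)
    return np.sqrt(2.0 ** (1 - a) / n) * numerator / np.sqrt(1 - np.cos(2 * np.pi * j / n))

n = 32
worst_gap = max(
    abs(direct(n, a, j, k0) - closed_form(n, a, j))
    for a in range(1, 6)
    for j in range(1, n)
    for k0 in (0, n - 2 ** a)
)
```

For a = 1 and j = n/2 both expressions equal $\sqrt{2/n}$, the value that drives the worst-case bound.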
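Lemma 3. Let $f_j$, $j \in \{0, \ldots, n-1\}$, be a row from the n × n DFT matrix F and let $H^*_\Lambda$
be the inverse discrete Haar wavelet transform restricted to the column indexed by $\Lambda$. Let
$\Lambda \in \{0, 1, \ldots, n-1\}$ so that $|\Lambda| = 1$. Then

\[
\min_{\Lambda \in \{0,\ldots,n-1\}} \; \max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle| = \sqrt{\frac{2}{n}} \tag{27}
\]

and

\[
\max_{\Lambda \in \{0,\ldots,n-1\}} \; \max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle| = 1. \tag{28}
\]

Since $|\Lambda| = 1$, $\alpha_\Lambda$ is just a scalar, so that the measurements (20) can be written as

\[
y_i = \langle f_j, H^*_\Lambda \rangle \alpha_\Lambda + z_i. \tag{29}
\]

This can be concisely written as

\[
y = A\alpha_\Lambda + z, \tag{30}
\]

where A is the m-dimensional column vector with each entry equal to $\langle f_j, H^*_\Lambda \rangle$. To estimate $\alpha_\Lambda$, we
apply $A^\dagger$ to y. In this case, $A^\dagger$ is an m-dimensional row vector, with each entry equal to
$\frac{1/m}{\langle f_j, H^*_\Lambda \rangle}$. Therefore,

\[
\hat{\alpha}_\Lambda = A^\dagger y = A^\dagger (A\alpha_\Lambda + z)
= \alpha_\Lambda + \frac{1/m}{\langle f_j, H^*_\Lambda \rangle} \sum_{i=1}^{m} z_i.
\]

Using this, and since $\sum_{i=1}^{m} z_i \sim \mathcal{N}(0, m\sigma^2)$, we find

\begin{align*}
\mathbb{E}\|\hat{x} - x\|_2^2 &= \mathbb{E}\|H^*(\hat{\alpha} - \alpha)\|_2^2
= \mathbb{E}\|\hat{\alpha} - \alpha\|_2^2 = \mathbb{E}|\hat{\alpha}_\Lambda - \alpha_\Lambda|^2 \\
&= \mathbb{E}\left| \frac{1/m}{\langle f_j, H^*_\Lambda \rangle} \sum_{i=1}^{m} z_i \right|^2
= \left( \frac{1/m}{|\langle f_j, H^*_\Lambda \rangle|} \right)^2 \mathbb{E}\left| \sum_{i=1}^{m} z_i \right|^2 \\
&= \left( \frac{1/m}{|\langle f_j, H^*_\Lambda \rangle|} \right)^2 m\sigma^2
= \frac{1/m}{|\langle f_j, H^*_\Lambda \rangle|^2}\,\sigma^2. \tag{31}
\end{align*}

Applying the bounds from Lemma 3 to (31), we arrive at

\[
\frac{\sigma^2}{m} \le \mathbb{E}\|\hat{x} - x\|_2^2 \le \frac{n\sigma^2}{2m},
\]

as desired. □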
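The extreme values of the best-case correlation, and the resulting range of the MSE, can be confirmed numerically for a moderate dimension. This is a sketch under stated assumptions: the helper `haar_matrix`, the dimension n = 64, and the values of m and sigma are chosen only for illustration.

```python
import numpy as np

def haar_matrix(n):
    """n x n discrete Haar transform with unit-norm rows; n must be a power of 2."""
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)
    for j in range(1, n):
        p = j.bit_length() - 1                 # p = floor(log2(j))
        q = j - 2 ** p + 1
        width = n // 2 ** p
        start = (q - 1) * width
        H[j, start:start + width // 2] = 2 ** (p / 2) / np.sqrt(n)
        H[j, start + width // 2:start + width] = -(2 ** (p / 2)) / np.sqrt(n)
    return H

n, m, sigma = 64, 100, 0.1
F = np.fft.fft(np.eye(n)) / np.sqrt(n)         # unitary DFT
G = np.abs(F @ haar_matrix(n).T)               # G[j, lam] = |<f_j, H*_lam>|
best = G.max(axis=0)                           # best DFT row for each 1-sparse support
mse = sigma ** 2 / (m * best ** 2)             # MSE when the best row is repeated m times
num_worst = int(np.sum(np.isclose(best, best.min())))
```

In this example the smallest best-case correlation is attained by the n/2 columns of the highest frequency block, and the largest by the single scaling column.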
Corollary 1. Suppose $x = H^*\alpha$ is 1-sparse. Suppose after m/2 nonadaptive DFT measurements, the
support $\Lambda$ is correctly identified. For the remaining m/2 DFT measurements, we measure repeatedly
with a particular measurement from the n × n DFT F; denote this measurement by $f_j$, where
$j \in \{0, 1, \ldots, n-1\}$ is some row index. Then our observations are of the form (20) for
i = m/2 + 1, ..., m, where the noise $z_i$ are i.i.d. $\mathcal{N}(0, \sigma^2)$. Then the MSE is given by

\[
\mathbb{E}\|\hat{x} - x\|_2^2 = \frac{2/m}{|\langle f_j, H^*_\Lambda \rangle|^2}\,\sigma^2 \tag{32}
\]

and is bounded by

\[
\frac{2\sigma^2}{m} \le \mathbb{E}\|\hat{x} - x\|_2^2 \le \frac{n\sigma^2}{m}, \tag{33}
\]

where the expectation is taken with respect to z.


The upper bound on $\mathbb{E}\|\hat{x} - x\|_2^2$ in Corollary 1 is precisely the lower bound from Theorem 1 when
s = 1. This means that there is indeed some room for improvement with adaptive sensing when
the sparsity basis is the Haar wavelet transform rather than the canonical basis. Corollary 1 shows
that the performance of adaptive sensing, in terms of the MSE, depends on the support location of 
1-sparse signals. The best adaptive recovery is possible when the support is located on the lowest 
wavelet frequency (A = {0}, or the first Haar wavelet coefficient) while the worst recovery occurs 
when the support is located on any of the higher wavelet frequencies in block a = 1 (the latter half 
of the Haar wavelet coefficients). This of course matches the intuition based on the correlations in 
these two bases. This suggests that structured signals such as those that are tree-sparse will benefit 
more from adaptivity than signals that have a uniformly distributed support. 

In light of the discrepancy between (27) and (28), one wishes to know, in some sense, what fraction
of signals allow for recovery more like one versus the other. Figure 7 shows how
$\max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle|$ varies by maximizing $\max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle|$ over $\Lambda$ while
successively removing blocks from $H^*$. Using our notation for blocks, the blocks of $H^*$ are removed
in the following (top-down) order: $\log_2(n), \log_2(n) - 1, \ldots, 1$. Then, we plot the value of
$\max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle|$ for the remaining submatrix of $H^*$. Hence, we see that the MSE is higher
for signals supported on higher wavelet frequencies, and the upper bound of (33) is achieved by exactly
half of the possible signal support sets, whereas the lower bound of (33) is achieved by exactly one
of the possible signal support sets (i.e., $\Lambda = \{0\}$). Fortunately, the support of natural images tends
to be concentrated on lower-frequency wavelet coefficients [31].

5 Discussion 

Adaptive sensing has tremendous potential to improve the accuracy of sparse recovery in a variety of 
settings. However, in many practical applications one does not have the freedom to choose arbitrary 
measurement vectors, but instead must choose from a specified pool of measurements. One example 
of particular interest is the setting where measurements must be taken from the Fourier ensemble, 
as is the case in many medical imaging applications. In this paper we established fundamental 
limitations on the improvements offered by adaptivity in this setting for certain sparsity bases. On 
the other hand, we argued that for other sparsity bases (such as the Haar wavelet basis) the role 
of adaptivity in the constrained setting is much less straightforward. We developed a sampling 
scheme which uses a simple optimization procedure to select measurements adapted to the signal 
support. This scheme results in significant improvements once an accurate estimate of the support 
is obtained, which in practice can be achieved by first dedicating a portion of the measurements
to support estimation. Though this approach is not necessarily provably optimal, it nonetheless
demonstrates the potential of adaptive sensing in the constrained setting. We believe future work
in this area can further the understanding of both the limitations of this approach as well as the
potential benefits.

Figure 7: The value of $\max_{j \in \{0,\ldots,n-1\}} |\langle f_j, H^*_\Lambda \rangle|$ is displayed against the (log of the) signal
dimension $n = 2^P$. The solid magenta curve shows the optimization when minimizing over all possible
supports $\Lambda \in \{0, \ldots, n-1\}$, given by (27). The solid red curve shows the opposite optimization
when maximizing over all possible supports $\Lambda \in \{0, \ldots, n-1\}$, given by (28). The remaining
dashed curves show the optimization when maximizing over all supports $\Lambda$ except those in the
blocks indicated.